Proceedings of the 2nd Workshop on NLP and XML (NLPXML-2002)

Authors

  • Laurent Romary
  • Chieko Nakabasami
  • Nancy Ide
  • Graham Wilcock
  • Thierry Declerck
  • Guillermo Barrutieta
  • Joseba Abaitua
  • John Bateman
  • Renate Henschel
Abstract

Content selection is a key factor in any successful document generation system. This paper shows how a content selection algorithm has been implemented using an efficient combination of XML/XSL technology and the framework of Rhetorical Structure Theory (RST) for discourse modeling. The system generates multilingual documents adapted to user profiles in a web-based learning environment. This CourseViewGenerator applies simplified RST schemes to build a master document in XML, from which content segments are chosen to suit the user's needs. The document is personalised by applying a sequence of text-selection filtering levels based on the user aspects given as input. These cascading filters are implemented in XSL.

Introduction

It is widely accepted that content selection plays a crucial role in text generation (Reiter and Dale 2000). This process is normally seen as a goal-directed activity in which text segments are fitted into the discourse structure of the text so as to convey a coherent communicative goal (Grosz and Sidner 1986). Content planning techniques, such as textual schemas (McKeown 1985) or plan operators (Moore and Paris 1993), have been used successfully as models of text generation. There are cases, though, in which these techniques may face limitations, for example when the structure of the discourse is difficult to anticipate (Mellish et al. 1998). Nevertheless, when a set of well-defined communicative goals exists, complex goals can be broken down into sequences of utterances, and generation becomes an efficient "top-down" process (Marcu 1997).

This paper presents a macro-level content selection algorithm that applies user profiles to constrain and discriminate the contents of a text whose discourse structure is represented using a simplified version of Rhetorical Structure Theory (Mann and Thompson 1988). The algorithm has been implemented using XML/XSL-based technology in a multilingual document generation system for educational purposes.
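The abstract describes personalisation as a cascade of filtering levels applied to an XML master document whose segments carry a simplified RST structure. The original filters are written in XSL; the Python sketch here only mirrors the idea using the standard library's `xml.etree.ElementTree`. The element and attribute names (`seg`, `role`, `rel`, `level`, `lang`), the user-profile keys and the three filter levels are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical master document. Each <seg> records its RST role (nucleus
# or satellite), the relation a satellite holds to its nucleus, a minimum
# learner level and a language. All names here are illustrative
# assumptions, not the schema used by CourseViewGenerator.
MASTER = """
<course>
  <seg role="nucleus" level="1" lang="en">Definition of XML.</seg>
  <seg role="satellite" rel="elaboration" level="2" lang="en">History of SGML.</seg>
  <seg role="satellite" rel="example" level="1" lang="en">A small XML file.</seg>
  <seg role="nucleus" level="1" lang="es">Definición de XML.</seg>
</course>
"""

# A filter level returns True if a segment survives for this user.
Filter = Callable[[ET.Element, dict], bool]


def by_language(seg: ET.Element, user: dict) -> bool:
    return seg.get("lang") == user["lang"]


def by_level(seg: ET.Element, user: dict) -> bool:
    return int(seg.get("level", "1")) <= user["level"]


def by_relation(seg: ET.Element, user: dict) -> bool:
    # Nuclei are always kept; satellites only for relations the user wants.
    return seg.get("role") == "nucleus" or seg.get("rel") in user["relations"]


def personalise(xml_text: str, user: dict, filters: list[Filter]) -> list[str]:
    segments = list(ET.fromstring(xml_text.strip()))
    for keep in filters:
        # Cascade: each level only sees what the previous level let through.
        segments = [s for s in segments if keep(s, user)]
    return [s.text for s in segments]


LEVELS = [by_language, by_level, by_relation]
user = {"lang": "en", "level": 1, "relations": {"example"}}
print(personalise(MASTER, user, LEVELS))
# → ['Definition of XML.', 'A small XML file.']
```

Keeping each level as a separate predicate lets levels be reordered or added without touching the others, much as separate XSL stylesheets can be chained; the order and number of levels actually used by CourseViewGenerator are not specified here.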
The main objective of this CourseViewGenerator system (Barrutieta 2001; Barrutieta et al. 2001) is to automatically produce multilingual learning documents that suit the student's needs at each particular stage of the learning process. Figure 1 shows the overall architecture of the system.

[Figure 1: Overall architecture of the system. Recoverable labels: Course material (multilingual parallel corpus); User aspects.]
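Figure 1 shows course material held as a multilingual parallel corpus alongside the user aspects. One plausible reading, sketched here as an assumption rather than the system's documented design, is that because the corpus is parallel, content can be selected once over language-independent units and then realised in whichever language the user requests, so every language version receives equivalent content. The `Unit` and `UserAspects` types, the `level` aspect and the sample sentences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One aligned unit of a hypothetical parallel corpus."""
    unit_id: str
    level: int  # minimum learner level (an assumed user aspect)
    texts: dict[str, str] = field(default_factory=dict)  # language -> text


@dataclass
class UserAspects:
    language: str
    level: int


def select_units(corpus: list[Unit], aspects: UserAspects) -> list[Unit]:
    # Selection ignores language, so it is identical for every language version.
    return [u for u in corpus if u.level <= aspects.level]


def realise(units: list[Unit], language: str) -> list[str]:
    # Units lacking a translation are skipped (a simplifying assumption).
    return [u.texts[language] for u in units if language in u.texts]


def course_view(corpus: list[Unit], aspects: UserAspects) -> list[str]:
    return realise(select_units(corpus, aspects), aspects.language)


CORPUS = [
    Unit("u1", 1, {"en": "XML is a markup language.",
                   "es": "XML es un lenguaje de marcado."}),
    Unit("u2", 2, {"en": "XSL transforms XML documents.",
                   "es": "XSL transforma documentos XML."}),
]

print(course_view(CORPUS, UserAspects(language="es", level=1)))
# → ['XML es un lenguaje de marcado.']
```

Separating selection from realisation is what makes the alignment useful: the same selected units can be rendered in English or Spanish without re-running the selection step.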

Similar resources

XML-based NLP Tools for Analysing and Annotating Medical Language

We describe the use of a suite of highly flexible XML-based NLP tools in a project for processing and interpreting text in the medical domain. The main aim of the paper is to demonstrate the central role that XML mark-up and XML NLP tools have played in the analysis process and to describe the resultant annotated corpus of MEDLINE abstracts. In addition to the XML tools, we have succeeded in in...

MUP - The UIC Standoff Markup Tool

Recently developed markup tools for dialogue work are quite sophisticated and require considerable knowledge and overhead, but older tools do not support XML standoff markup, the current annotation style of choice. For the DIAG-NLP project we have created a “lightweight” but modern markup tool that can be configured and used by the working NLP researcher.

A Standoff Annotation Interface between DELPH-IN Components

We present a standoff annotation framework for the integration of NLP components, currently implemented in the context of the DELPH-IN tools. This provides a flexible standoff pointer scheme suitable for various types of data, a lattice encodes structural ambiguity, intraannotation relationships are encoded, and annotations are decorated with structured content. We provide an XML serialization...

Publication date: 2002